Inference for Approximating Regression Models

Authors

  • Emil Pitkin
  • Lawrence D. Brown

Abstract

The assumptions underlying the Ordinary Least Squares (OLS) model are regularly, and sometimes severely, violated. As a consequence, inferential procedures presumed valid for OLS are invalidated in practice. We describe a framework that is robust to model violations, together with the modifications to classical inferential procedures needed to preserve inferential validity. Because the covariates are assumed to be stochastically generated ("Random-X"), the sought-after criterion for coverage becomes marginal rather than conditional. We focus on slopes, mean responses, and individual future observations. For slopes and mean responses, the targets of inference are redefined by means of least squares regression at the population level. The partial slopes defined by this population regression, rather than the slopes of an assumed linear model, become the population quantities of interest, and they can be estimated unbiasedly. Under this framework, we estimate the Average Treatment Effect (ATE) in Randomized Controlled Trials (RCTs) and derive an estimator that is more efficient than one in common use. Expressing the ATE as a slope coefficient in a population regression yields an immediate proof of unbiasedness. For the mean response, the target is the value, at a given x, of the best least squares approximation to the population response surface, rather than the conditional mean of y. Calibration through the pairs bootstrap can markedly improve coverage of this target. Turning to individual observations, we show that for covering future individual responses, a simple in-sample calibration technique that widens the empirical interval until it contains $(1-\alpha)\cdot 100\%$ of the sample residuals is asymptotically valid, even in the face of gross model violations. OLS is startlingly robust to model departures when a future y needs to be covered, but nonlinearity combined with a skewed X-distribution can severely undermine coverage of the mean response.
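The residual-based calibration for future responses can be sketched in a few lines. This is a minimal sketch under assumptions the abstract does not state: "widening the empirical interval" is read here as using a constant half-width equal to the smallest value that covers at least $(1-\alpha)\cdot 100\%$ of the absolute in-sample OLS residuals. The helper names (`fit_ols`, `predict`, `calibrated_half_width`, `residual_calibrated_interval`) and the demonstration data are hypothetical, not the author's code; NumPy is the only dependency.

```python
import numpy as np


def fit_ols(X, y):
    """OLS fit with an intercept. X has shape (n, p); returns (p + 1,) coefficients."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    """Evaluate the fitted linear approximation at the rows of X (shape (m, p))."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def calibrated_half_width(abs_resid, alpha):
    """Smallest sample value c such that at least (1 - alpha) of abs_resid are <= c."""
    sorted_r = np.sort(abs_resid)
    k = int(np.ceil((1 - alpha) * len(sorted_r))) - 1
    return sorted_r[k]


def residual_calibrated_interval(X, y, X_new, alpha=0.1):
    """Prediction intervals for future responses at the rows of X_new.

    The interval is centered at the OLS prediction, and its half-width comes
    from the in-sample residuals rather than a normal-theory formula, so the
    calibration does not rely on the linear model being correct or on
    normal, homoscedastic errors.
    """
    beta = fit_ols(X, y)
    abs_resid = np.abs(y - predict(beta, X))
    c = calibrated_half_width(abs_resid, alpha)
    center = predict(beta, X_new)
    return center - c, center + c


if __name__ == "__main__":
    # Hypothetical data: skewed X, nonlinear mean, heteroscedastic noise.
    rng = np.random.default_rng(0)

    def draw(n):
        X = rng.exponential(1.0, size=(n, 1))
        y = X[:, 0] ** 2 + rng.normal(0.0, 1.0 + X[:, 0])
        return X, y

    X, y = draw(5000)
    X_test, y_test = draw(50000)
    lo, hi = residual_calibrated_interval(X, y, X_test, alpha=0.1)
    coverage = np.mean((y_test >= lo) & (y_test <= hi))
    print(f"empirical coverage of future responses: {coverage:.3f}")
```

Under Random-X sampling, a new pair (x, y) has, for a fixed fitted line, approximately the same residual distribution as the training pairs, which is why this kind of interval targets marginal rather than conditional coverage even though the linear fit is misspecified in the demonstration.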
Our ATE estimator dominates the commonly used estimator, and the higher the $R^2$ of the regression of a patient's response on covariates, treatment indicator, and their interactions, the better our estimator's relative performance. By treating a regression model as a semiparametric approximation to a stochastic mechanism, and not as its description, we can be assured that a stated coverage guarantee genuinely holds.

Degree Type: Dissertation
Degree Name: Doctor of Philosophy (PhD)
Graduate Group: Statistics
First Advisor: Lawrence D. Brown
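The ATE comparison can be illustrated by simulation. The abstract does not spell out either estimator, so this is a sketch under stated assumptions: the "common" estimator is taken to be the difference in means between arms, and the regression-based estimator is taken to be the treatment coefficient from an OLS fit of the response on the treatment indicator, covariates centered at their sample means, and treatment-by-covariate interactions. That matches the regression described in the abstract but is not necessarily the dissertation's exact estimator. The data-generating process and all function names (`diff_in_means`, `interacted_regression_ate`, `simulate`) are hypothetical.

```python
import numpy as np


def diff_in_means(y, t):
    """Difference in mean response between treated (t == 1) and control (t == 0)."""
    return y[t == 1].mean() - y[t == 0].mean()


def interacted_regression_ate(y, t, X):
    """Treatment coefficient from OLS of y on [1, t, Xc, t * Xc], Xc = centered X."""
    # With centered covariates, the t coefficient is the fitted effect at the sample mean of X.
    Xc = X - X.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), t, Xc, t[:, None] * Xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(n=200, reps=2000, noise_sd=1.0, seed=0):
    """Sampling distribution of both estimators over repeated hypothetical trials.

    Covariates are redrawn in every trial (Random-X), exactly n // 2 patients
    (n assumed even) are randomized to treatment, and the individual effect
    1 + X[:, 0] varies with the first covariate, so the population ATE is 1.0.
    """
    rng = np.random.default_rng(seed)
    beta0 = np.array([2.0, -1.0])
    dm, reg = np.empty(reps), np.empty(reps)
    for r in range(reps):
        X = rng.normal(size=(n, 2))
        t = rng.permutation(np.repeat([0, 1], n // 2))  # complete randomization
        y0 = X @ beta0 + rng.normal(0.0, noise_sd, n)   # control potential outcome
        y1 = y0 + 1.0 + X[:, 0]                         # treated potential outcome
        y = np.where(t == 1, y1, y0)                    # observed response
        dm[r] = diff_in_means(y, t)
        reg[r] = interacted_regression_ate(y, t, X)
    return dm, reg


if __name__ == "__main__":
    for sd in (1.0, 5.0):
        dm, reg = simulate(noise_sd=sd)
        print(f"noise_sd={sd}: mean DM={dm.mean():.3f}, mean REG={reg.mean():.3f}, "
              f"var ratio REG/DM={reg.var() / dm.var():.3f}")
```

Raising `noise_sd` lowers the $R^2$ of the response on covariates, treatment, and interactions; in this sketch the variance ratio then moves toward 1, consistent with the claim that the advantage grows with $R^2$.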

Similar resources

Partially Improper Gaussian Priors for Nonparametric Logistic Regression

A "partially improper" Gaussian prior is considered for Bayesian inference in logistic regression. This includes generalized smoothing spline priors that are used for nonparametric inference about the logit, and also priors that correspond to generalized random effect models. Necessary and sufficient conditions are given for the posterior to be a proper probability measure, and bounds are given fo...

Analysis of the Posterior for

A "partially improper" Gaussian prior is considered for Bayesian inference in logistic regression. This includes generalized smoothing spline priors that are used for nonparametric inference about the logit, and also priors that correspond to generalized linear mixed models. Necessary and sufficient conditions are given for the posterior to be a proper probability measure, and bounds are given fo...

Bayesian Inference for Spatial Beta Generalized Linear Mixed Models

In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on the (0, 1) interval that covers symm...

Artificial intelligence-based approaches for multi-station modelling of dissolve oxygen in river

ABSTRACT: In this study, an adaptive neuro-fuzzy inference system and a feed-forward neural network, two artificial intelligence-based models, were used along with a conventional multiple linear regression model for multi-station modelling of dissolved oxygen concentration downstream of Mathura City in India. The data used are dissolved oxygen, pH, biological oxygen demand and water...

Prediction of soil cation exchange capacity using support vector regression optimized by genetic algorithm and adaptive network-based fuzzy inference system

Soil cation exchange capacity (CEC) is a parameter that represents soil fertility. Because CEC is difficult to measure, pedotransfer functions (PTFs) can be routinely applied to predict it from soil physicochemical properties that can be easily measured. This study developed the support vector regression (SVR) combined with genetic algorithm (GA) together with the adaptive network-based fuzzy infe...

Uncertainty Quality | Uncertainty in Deep Learning

In this chapter we assess the techniques developed in the previous chapters, concentrating on questions such as what our model uncertainty looks like. We experiment with different model architectures and approximating distributions, and use various regression and classification settings. Assessing the models’ confidence quantitatively, we can see how much we sacrifice in our attempt at deriving ...

Publication date: 2016